Abstract

The RooUnfold package provides a common framework to evaluate and use 
different unfolding algorithms, side-by-side. It currently provides implemen- 
tations or interfaces for the Iterative Bayes, Singular Value Decomposition, 
and TUnfold methods, as well as bin-by-bin and matrix inversion reference 
methods. Common tools provide covariance matrix evaluation and multi- 
dimensional unfolding. A test suite allows comparisons of the performance 
of the algorithms under different truth and measurement models. Here I out- 
line the package, the unfolding methods, and some experience of their use. 

1 RooUnfold package aims and features 

The RooUnfold package [1] was designed to provide a framework for different unfolding algorithms.
This approach simplifies the comparison between algorithms and has allowed common utilities to be
written. Currently RooUnfold implements or interfaces to the Iterative Bayes [2, 3], Singular Value
Decomposition (SVD) [4]-[6], TUnfold [7], bin-by-bin correction factors, and unregularized matrix
inversion methods.

The package is designed around a simple object-oriented approach, implemented in C++, and 
using existing ROOT [8] classes. RooUnfold defines classes for the different unfolding algorithms, 
which inherit from a common base class, and a class for the response matrix. The response matrix object
is independent of the unfolding, so it can be filled in a separate 'training' program.

RooUnfold can be linked into a stand-alone program, run from a ROOT/CINT script, or executed 
interactively from the ROOT prompt. The response matrix can be initialized using existing histograms 
or matrices, or filled with built-in methods (these can take care of the normalization when inefficiencies 
are to be considered). The results can be returned as a histogram with errors, or a vector with full 
covariance matrix. The framework also takes care of handling multi-dimensional distributions (with 
ROOT support for 1-, 2-, and 3-dimensional (1D,2D,3D) histograms), different binning for measured 
and truth distributions, variable binning, and the option to include or exclude under- and over-flows. 
It also supports different methods for calculating the errors that can be selected with a simple switch: 
bin-by-bin errors with no correlations, the full covariance matrix from the propagation of measurement 
errors in the unfolding, or the covariance matrix calculated using Monte Carlo (MC) toys. 
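
As an illustration of the last option, the covariance matrix can be estimated from the spread of the unfolded results over many toy experiments. The following is a minimal sketch, not the RooUnfold implementation: the function name `toyCovariance` is hypothetical, each toy is assumed to be an already unfolded vector of bin contents, and the unbiased 1/(N-1) normalization is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Sample covariance of the unfolded bin contents over a set of toys.
// toys[t][i] is the content of unfolded bin i in toy experiment t.
Mat toyCovariance(const std::vector<Vec>& toys) {
    const std::size_t nToys = toys.size();
    assert(nToys > 1);
    const std::size_t nBins = toys[0].size();

    // Mean unfolded content of each bin.
    Vec mean(nBins, 0.0);
    for (const Vec& t : toys) {
        assert(t.size() == nBins);
        for (std::size_t i = 0; i < nBins; ++i) mean[i] += t[i] / nToys;
    }

    // Accumulate products of deviations; off-diagonal terms carry the
    // bin-to-bin correlations that the diagonal errors alone would miss.
    Mat cov(nBins, Vec(nBins, 0.0));
    for (const Vec& t : toys)
        for (std::size_t i = 0; i < nBins; ++i)
            for (std::size_t j = 0; j < nBins; ++j)
                cov[i][j] += (t[i] - mean[i]) * (t[j] - mean[j]) / (nToys - 1);
    return cov;
}
```

In practice each toy would be produced by fluctuating the measured distribution within its errors and repeating the unfolding.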

All these details are handled by the framework, so they do not have to be implemented for each
algorithm. However, different bin layouts may not produce good results for algorithms that rely on the
global shape of the distribution (SVD).

A toy MC test framework is provided. It allows different MC probability density functions (PDFs)
and parameters to be selected, different binnings to be compared, and the unfolding to be performed with
the different algorithms and with varying regularization parameters. Tests can be performed with
1D, 2D, and 3D distributions. The results of a few such tests are presented in Section 4.

2 C++ classes 

Figure 1 summarizes how the ROOT and RooUnfold classes are used together. The RooUnfoldResponse
object can be constructed using a 2D response histogram (TH2D) and 1D truth and measured projections
(these are required to determine the effect of inefficiencies). Alternatively, RooUnfoldResponse can be
filled directly with the Fill(x_measured, x_true) and Miss(x_true) methods, where the Miss method is
used to count an event that was not measured and should be counted towards the inefficiency.

Fig. 1: The RooUnfold classes. The training truth, training measured, measured data, and unfolded distributions
can also be given as TH2D or TH3D histograms.

The RooUnfoldResponse object can be saved to disk using the usual ROOT input/output streamers.
This makes it easy to run the MC training and the unfolding step in separate programs.
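
The bookkeeping behind filling a response matrix event by event can be sketched in plain C++. This is only an illustration of the idea, assuming bin indices have already been determined: `ToyResponse` and its members are hypothetical names and are not the RooUnfoldResponse class or its interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a response-matrix object (not RooUnfoldResponse).
struct ToyResponse {
    std::size_t nMeas, nTrue;
    std::vector<std::vector<double>> counts;  // counts[j][i]: measured bin j, truth bin i
    std::vector<double> truth;                // all generated events per truth bin

    ToyResponse(std::size_t nm, std::size_t nt)
        : nMeas(nm), nTrue(nt),
          counts(nm, std::vector<double>(nt, 0.0)), truth(nt, 0.0) {}

    // Event generated in truth bin i and reconstructed in measured bin j.
    void fill(std::size_t j, std::size_t i) {
        assert(j < nMeas && i < nTrue);
        counts[j][i] += 1.0;
        truth[i] += 1.0;
    }

    // Event generated in truth bin i but never reconstructed: it enters
    // only the truth total, and so lowers the efficiency of that bin.
    void miss(std::size_t i) {
        assert(i < nTrue);
        truth[i] += 1.0;
    }

    // P(E_j | C_i): probability that an event from truth bin i lands in measured bin j.
    double probability(std::size_t j, std::size_t i) const {
        return truth[i] > 0.0 ? counts[j][i] / truth[i] : 0.0;
    }

    // Efficiency of truth bin i: sum over measured bins of P(E_j | C_i).
    double efficiency(std::size_t i) const {
        double sum = 0.0;
        for (std::size_t j = 0; j < nMeas; ++j) sum += probability(j, i);
        return sum;
    }
};
```

With three reconstructed events and one missed event in a truth bin, the efficiency of that bin comes out as 3/4.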

A RooUnfold object is constructed using a RooUnfoldResponse object and the measured data. It
can be constructed as a RooUnfoldBayes, RooUnfoldSvd, RooUnfoldTUnfold (etc.) object, depending
on the algorithm required.

The results of the unfolding can be obtained as ROOT histograms (TH1D, TH2D, or TH3D) or
as a ROOT vector (TVectorD) and covariance matrix (TMatrixD). The histogram will include just the 
diagonal elements of the error matrix. This should be used with care, given the significant correlations 
that can occur if there is much bin-to-bin migration. 

3 Unfolding algorithms 
3.1 Iterative Bayes' theorem 

The RooUnfoldBayes algorithm uses the method described by D'Agostini in [2]. Repeated application
of Bayes' theorem is used to invert the response matrix. Regularization is achieved by stopping iterations
before reaching the 'true' (but wildly fluctuating) inverse. The regularization parameter is just the number 
of iterations. In principle, this has to be tuned according to the sample statistics and binning. In practice, 
the results are fairly insensitive to the precise setting used and four iterations are usually sufficient. 

RooUnfoldBayes takes the training truth as its initial prior, rather than the flat distribution
described by D'Agostini. This should not bias the result once we have iterated, but may allow an
optimum to be reached after fewer iterations.
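
The iteration can be sketched in a few lines of stand-alone C++. This is a simplified illustration of the iterative Bayes update, not the RooUnfoldBayes code: the function name `unfoldBayes` and the `Vec`/`Mat` aliases are hypothetical, the response matrix is assumed to be given as probabilities P(E_j|C_i), and errors and smoothing are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // P[j][i] = P(E_j | C_i)

// Iterative Bayesian unfolding of the measured distribution `meas`,
// starting from `prior` (e.g. the training truth) and running nIter iterations.
Vec unfoldBayes(const Mat& P, const Vec& meas, Vec prior, int nIter) {
    const std::size_t nE = P.size();
    assert(nE > 0 && meas.size() == nE);
    const std::size_t nC = P[0].size();
    assert(prior.size() == nC);

    // Efficiencies eps_i = sum_j P(E_j | C_i).
    Vec eps(nC, 0.0);
    for (std::size_t j = 0; j < nE; ++j)
        for (std::size_t i = 0; i < nC; ++i) eps[i] += P[j][i];

    for (int it = 0; it < nIter; ++it) {
        Vec next(nC, 0.0);
        for (std::size_t j = 0; j < nE; ++j) {
            // Expected content of measured bin j under the current prior.
            double norm = 0.0;
            for (std::size_t l = 0; l < nC; ++l) norm += P[j][l] * prior[l];
            if (norm <= 0.0) continue;
            // Share measured bin j among the causes using Bayes' theorem,
            // correcting each cause for its efficiency.
            for (std::size_t i = 0; i < nC; ++i)
                if (eps[i] > 0.0)
                    next[i] += P[j][i] * prior[i] / (eps[i] * norm) * meas[j];
        }
        prior = next;  // the estimate becomes the prior of the next iteration
    }
    return prior;
}
```

The number of iterations, `nIter`, plays the role of the regularization parameter: a small value keeps the result close to the prior, while a large value approaches the unregularized inverse.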

This implementation takes account of errors on the data sample but not, by default, uncertainties 
in the response matrix due to finite MC statistics. That calculation can be very slow, and usually the 
training sample is much larger than the data sample. 

RooUnfoldBayes does not normally do smoothing, since this has not been found to be necessary 
and can, in principle, bias the distribution. Smoothing can be enabled with an option.

3.2 Singular Value Decomposition 

RooUnfoldSvd provides an interface to the TSVDUnfold class implemented in ROOT by Tackmann @, 
which uses the method of Höcker and Kartvelishvili [4]. The response matrix is inverted using
singular value decomposition, which allows for a linear implementation of the unfolding algorithm. The
normalization to the number of events is retained in order to minimize uncertainties due to the size of
the training sample. Regularization is performed using a smooth cut-off on small singular value
contributions ($s_i^2 \to s_i^2/(s_i^2 + s_k^2)$, where the $k$th singular value defines the cut-off), which
correspond to high-frequency fluctuations.
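
The effect of the smooth cut-off can be seen by computing the damping factor $s_i^2/(s_i^2 + s_k^2)$ for each singular value. The sketch below assumes the singular values are already known and sorted in decreasing order; the function name `svdFilterFactors` and the 0-based index `k` are conventions of this example, not of TSVDUnfold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Damping factor for each singular-value contribution, s_i^2 / (s_i^2 + s_k^2),
// where s[k] is the singular value chosen as the cut-off (0-based index).
std::vector<double> svdFilterFactors(const std::vector<double>& s, std::size_t k) {
    assert(k < s.size());
    const double cut2 = s[k] * s[k];
    std::vector<double> f(s.size(), 0.0);
    for (std::size_t i = 0; i < s.size(); ++i) {
        const double si2 = s[i] * s[i];
        // Large singular values pass almost unchanged; small ones,
        // which carry the high-frequency fluctuations, are suppressed.
        f[i] = (si2 + cut2 > 0.0) ? si2 / (si2 + cut2) : 0.0;
    }
    return f;
}
```

For singular values 4, 2, 1 with the cut-off at the second value, the factors are 0.8, 0.5, and 0.2: the contribution at the cut-off itself is halved.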

The regularization needs to be tuned according to the distribution, binning, and sample statistics
in order to minimize the bias due to the choice of the training sample (which dominates at small k) while
retaining small statistical fluctuations in the unfolding result (which grow at large k).

The unfolded error matrix includes the contribution of uncertainties on the response matrix due to 
finite MC training statistics. 

3.3 TUnfold 

RooUnfoldTUnfold provides an interface to the TUnfold method implemented in ROOT by Schmitt [7].
TUnfold performs a matrix inversion with 0-, 1-, or 2-order polynomial regularization of neighbouring 
bins. RooUnfold automatically takes care of packing 2D and 3D distributions and creating the appropri- 
ate regularization matrix required by TUnfold. 
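
The three regularization orders can be pictured as penalties on the size, the first difference, or the second difference (curvature) of neighbouring bins. The following 1D sketch is an illustration only: the function name `regularizationPenalty` is hypothetical, and it leaves out any bias distribution and overall regularization strength that a full implementation would include.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of squared 0th-, 1st-, or 2nd-order differences of neighbouring bins.
// order 0 penalizes bin contents, order 1 their slope, order 2 their curvature.
double regularizationPenalty(const std::vector<double>& x, int order) {
    assert(order >= 0 && order <= 2);
    const std::size_t n = x.size();
    const std::size_t span = static_cast<std::size_t>(order);
    double sum = 0.0;
    for (std::size_t i = 0; i + span < n; ++i) {
        double d = 0.0;
        if (order == 0)      d = x[i];
        else if (order == 1) d = x[i + 1] - x[i];
        else                 d = x[i] - 2.0 * x[i + 1] + x[i + 2];
        sum += d * d;
    }
    return sum;
}
```

A straight-line distribution has zero second-order penalty, so curvature regularization leaves linear trends untouched while damping bin-to-bin oscillations.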

TUnfold can automatically determine an optimal regularization parameter ($\tau$) by scanning the
'L-curve' of $\log_{10}\chi^2$ vs $\log_{10}\tau$.

3.4 Unregularized algorithms 

Two simple algorithms are included for reference: RooUnfoldBinByBin, which applies MC correction
factors with no inter-bin migration, and RooUnfoldInvert, which performs unregularized matrix inversion
with singular value removal (TDecompSVD). These methods are not generally recommended:
the former risks biases from the MC model, while the latter can give large bin-bin correlations and
magnify statistical fluctuations.
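
The bin-by-bin method is simple enough to write down directly, which also makes its dependence on the MC model visible. This is a minimal sketch assuming identical binning for the measured and truth distributions; the function name `unfoldBinByBin` is hypothetical and this is not the RooUnfoldBinByBin code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bin-by-bin correction: each measured bin is scaled by the ratio of
// training truth to training measured in the same bin.
std::vector<double> unfoldBinByBin(const std::vector<double>& meas,
                                   const std::vector<double>& trainTruth,
                                   const std::vector<double>& trainMeas) {
    assert(meas.size() == trainTruth.size() && meas.size() == trainMeas.size());
    std::vector<double> out(meas.size(), 0.0);
    for (std::size_t i = 0; i < meas.size(); ++i) {
        // The correction factor comes entirely from the training sample, so any
        // migration between bins is absorbed into a model-dependent factor.
        if (trainMeas[i] > 0.0) out[i] = meas[i] * trainTruth[i] / trainMeas[i];
    }
    return out;
}
```

Because each factor is computed from the training sample alone, a training model that differs in shape from the data changes the result.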

4 Examples 

Examples of toy MC tests generated by RooUnfoldTest are shown in Figs. 2-4. These provide a
challenging test of the procedure. Completely different training and test MC models are used: a single wide
Gaussian PDF for training and a double Breit-Wigner for testing. In both cases these are smeared, shifted,
and a variable inefficiency is applied to produce the 'measured' distributions.

5 Unfolding errors 

Regularization introduces inevitable correlations between bins in the unfolded distribution. To calculate
a correct $\chi^2$, one has to invert the covariance matrix, $V$:

$$\chi^2 = (x_\mathrm{measured} - x_\mathrm{true})^\mathrm{T} \, V^{-1} \, (x_\mathrm{measured} - x_\mathrm{true}) \tag{1}$$

However, in many cases, the covariance matrix is poorly conditioned, which makes calculating the 
inverse problematic. Inverting a poorly conditioned matrix involves subtracting large, but very similar 
numbers, leading to significant effects due to the machine precision. 
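
The difference between using the full covariance matrix and only its diagonal can be made concrete with a small stand-alone calculation. The sketch below is an assumption-laden illustration rather than anything taken from RooUnfold: the function names are hypothetical, and the linear system is solved by Gaussian elimination with partial pivoting instead of forming the inverse explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// chi^2 = d^T V^{-1} d, computed by solving V y = d and taking d . y.
double chi2Full(Mat V, const Vec& d) {
    const std::size_t n = d.size();
    assert(V.size() == n);
    Vec r = d;
    // Forward elimination with partial pivoting on the augmented system [V | r].
    for (std::size_t c = 0; c < n; ++c) {
        std::size_t p = c;
        for (std::size_t i = c + 1; i < n; ++i)
            if (std::fabs(V[i][c]) > std::fabs(V[p][c])) p = i;
        assert(V[p][c] != 0.0);  // a zero pivot means V is singular
        std::swap(V[c], V[p]);
        std::swap(r[c], r[p]);
        for (std::size_t i = c + 1; i < n; ++i) {
            const double f = V[i][c] / V[c][c];
            for (std::size_t k = c; k < n; ++k) V[i][k] -= f * V[c][k];
            r[i] -= f * r[c];
        }
    }
    // Back substitution for y.
    Vec y(n, 0.0);
    for (std::size_t i = n; i-- > 0;) {
        double s = r[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= V[i][k] * y[k];
        y[i] = s / V[i][i];
    }
    double chi2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) chi2 += d[i] * y[i];
    return chi2;
}

// chi^2 using only the diagonal errors, i.e. ignoring bin-to-bin correlations.
double chi2Diagonal(const Mat& V, const Vec& d) {
    double chi2 = 0.0;
    for (std::size_t i = 0; i < d.size(); ++i) chi2 += d[i] * d[i] / V[i][i];
    return chi2;
}
```

For two bins with unit variance, a correlation of 0.5, and residuals of one unit each, the full calculation gives 4/3 while the diagonal-only value is 2. As the correlation approaches 1 the pivots become very small, which is where the precision problems described above set in.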
Fig. 2: Unfolding with the Bayes algorithm. On the left, a double Breit-Wigner PDF on a flat background (green
curve) is used to generate a test 'truth' sample (upper histogram in blue). This is then smeared, shifted, and a 
variable inefficiency applied to produce the 'measured' distribution (lower histogram in red). Applying the Bayes 
algorithm with 4 iterations on this latter gave the unfolded result (black points), shown with errors from the diagonal 
elements of the error matrix. The bin-to-bin correlations from the error matrix are shown on the right. 




Fig. 3: Unfolding with the SVD algorithm (k = 30) on the same training and test samples as described in Fig. 2.




Fig. 4: Unfolding with the TUnfold algorithm ($\tau$ = 0.004) on the same training and test samples as described in
Fig. 2. Here we use two measurement bins for each truth bin.



5.1 Unfolding errors with the Bayes method 

As shown on the left-hand side of Fig. 5, the uncertainties calculated by propagation of errors in the
Bayes method were found to be significantly underestimated compared to those given by the toy MC.
This was found to be due to an omission in the original method outlined by D'Agostini ([2], Section 4).

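The Bayes method gives the unfolded distribution ('estimated causes'), $\hat{n}(C_i)$, as the result of
applying the unfolding matrix, $M_{ij}$, to the measurements ('effects'), $n(E_j)$:

$$\hat{n}(C_i) = \sum_{j=1}^{n_E} M_{ij}\, n(E_j) \quad\text{where}\quad M_{ij} = \frac{P(E_j|C_i)\, n_0(C_i)}{\epsilon_i \sum_{l=1}^{n_C} P(E_j|C_l)\, n_0(C_l)} \tag{2}$$

$P(E_j|C_i)$ is the $n_E \times n_C$ response matrix, $\epsilon_i \equiv \sum_{j=1}^{n_E} P(E_j|C_i)$ are efficiencies, and $n_0(C_i)$ is the prior
distribution, which is initially arbitrary (e.g. flat or MC model) but is updated on subsequent iterations.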

The covariance matrix, which here we call $V(\hat{n}(C_k), \hat{n}(C_l))$, is calculated by error propagation
from $n(E_j)$, but $M_{ij}$ is assumed to be itself independent of $n(E_j)$. That is only true for the first iteration.
For subsequent iterations, $n_0(C_i)$ is replaced by $\hat{n}(C_i)$ from the previous iteration, and $\hat{n}(C_i)$ depends
on $n(E_j)$ (Eq. (2)).

Fig. 5: Bayesian unfolding errors (lines) compared to toy MC RMS (points) for 1, 2, 3, and 9 iterations on the
Fig. 2 test. The left-hand plot shows the errors using D'Agostini's original method, ignoring any dependence on
previous iterations (only the $M_{ij}$ term in Eq. (3)). The right-hand plot shows the full error propagation.

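To take this into account, we compute the error propagation matrix

$$\frac{\partial \hat{n}(C_i)}{\partial n(E_j)} = M_{ij} + \sum_{k=1}^{n_E} M_{ik}\, n(E_k) \left( \frac{1}{n_0(C_i)} \frac{\partial n_0(C_i)}{\partial n(E_j)} - \sum_{l=1}^{n_C} \frac{\epsilon_l}{n_0(C_l)} \frac{\partial n_0(C_l)}{\partial n(E_j)} M_{lk} \right) \tag{3}$$

This depends upon the matrix $\partial n_0(C_i)/\partial n(E_j)$, which is $\partial \hat{n}(C_i)/\partial n(E_j)$ from the previous iteration.
In the first iteration, the second term vanishes ($\partial n_0(C_i)/\partial n(E_j) = 0$) and we get
$\partial \hat{n}(C_i)/\partial n(E_j) = M_{ij}$.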
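
The origin of the second term can be made explicit by differentiating the unfolding matrix with respect to the measurements. The following is a sketch of that step, assuming that the response matrix $P(E_k|C_i)$ and the efficiencies $\epsilon_i$ are constants, so that $M_{ik}$ depends on $n(E_j)$ only through the prior $n_0$:

```latex
\begin{align*}
M_{ik} &= \frac{P(E_k|C_i)\, n_0(C_i)}{\epsilon_i \sum_{l} P(E_k|C_l)\, n_0(C_l)}, \\
\frac{\partial M_{ik}}{\partial n(E_j)}
  &= \frac{M_{ik}}{n_0(C_i)} \frac{\partial n_0(C_i)}{\partial n(E_j)}
   - M_{ik} \sum_{l} \frac{P(E_k|C_l)}{\sum_{m} P(E_k|C_m)\, n_0(C_m)}
     \frac{\partial n_0(C_l)}{\partial n(E_j)} \\
  &= M_{ik} \left( \frac{1}{n_0(C_i)} \frac{\partial n_0(C_i)}{\partial n(E_j)}
   - \sum_{l} \frac{\epsilon_l}{n_0(C_l)}\, M_{lk}\,
     \frac{\partial n_0(C_l)}{\partial n(E_j)} \right), \\
\frac{\partial \hat{n}(C_i)}{\partial n(E_j)}
  &= \frac{\partial}{\partial n(E_j)} \sum_{k} M_{ik}\, n(E_k)
   = M_{ij} + \sum_{k} n(E_k)\, \frac{\partial M_{ik}}{\partial n(E_j)} .
\end{align*}
```

The third line uses $P(E_k|C_l) / \sum_m P(E_k|C_m)\, n_0(C_m) = \epsilon_l M_{lk} / n_0(C_l)$, which follows from the definition of $M_{lk}$.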

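The error propagation matrix can be used to obtain the covariance matrix on the unfolded distribution

$$V(\hat{n}(C_k), \hat{n}(C_l)) = \sum_{i,j=1}^{n_E} \frac{\partial \hat{n}(C_k)}{\partial n(E_i)}\, V(n(E_i), n(E_j))\, \frac{\partial \hat{n}(C_l)}{\partial n(E_j)} \tag{4}$$

from the covariance matrix of the measurements, $V(n(E_i), n(E_j))$.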

Without the new second term in Eq. (3), the error is underestimated if more than one iteration is
used, but agrees well with toy MC tests if the full error propagation is used, as shown in Fig. 5.

6 Status and plans 

RooUnfold was first developed in the BABAR software environment and released stand-alone in 2007. 
Since then, it has been used by physicists from many different particle physics, particle-astrophysics, 
and nuclear physics groups. Questions, suggestions, and bug reports from users have prompted new 
versions with fixes and improvements. 

Last year I started working with a small group hosted by the Helmholtz Alliance, the Unfolding
Framework Project [9]. The project is developing unfolding experience, software, algorithms, and
performance tests. It has adopted RooUnfold as a framework for development.

Development and improvement of RooUnfold is continuing. In particular, determination of the 
systematic errors due to uncertainties on the response matrix, and due to correlated measurement bins 
will be added. The RooUnfold package will be incorporated into the ROOT distribution, alongside the 
existing TUnfold and TSVDUnfold classes. 